Skip to content

Add internal URI handling API #19073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

kocsismate
Copy link
Member

No description provided.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some first remarks. Did not yet look at everything.

Comment on lines 111 to 114
if (uri_handler_name == NULL) {
return uri_handler_by_name("parse_url", sizeof("parse_url") - 1);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting to parse_url in a new API is probably not a good idea. Instead the “legacy” users should just pass "parse_url" explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting to parse_url here works because that's the default indeed where php_uri_get_handler() is called, the other "backends" can only be used if the config is explicitly passed (not null).

The other reason why I opted for this approach is that it would be inconvenient to create and free a new zend_string when the legacy implementation is needed, and I wanted to avoid adding a known string just for this purpose, or exposing the C string based uri_handler_by_name function instead.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at this again and I must say that I'm having trouble meaningfully reviewing this. It adds a large amount of code with unclear purpose and confusing (to me) naming.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preliminary review round

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The switch from zend_string to the pointer-length pair seems to have been a good idea

@TimWolla TimWolla self-requested a review July 22, 2025 20:06
@TimWolla
Copy link
Member

I'll try to take another look tomorrow.

@kocsismate
Copy link
Member Author

@nielsdos Do you see anything that I should fix before merging it? I'd like to implement some of the cleanups that we discussed

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good, FWIW I agree with Tim on UNREACHABLE

return property_handler->read_func(internal_uri, read_mode, zv);
zend_result result = property_handler->read_func(internal_uri, read_mode, zv);

ZEND_ASSERT(result == FAILURE || (Z_TYPE_P(zv) == IS_STRING && GC_REFCOUNT(Z_STR_P(zv)) == 2) || Z_TYPE_P(zv) == IS_NULL || Z_TYPE_P(zv) == IS_LONG);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assertions intends to make sure that only a new string, null, or long can be returned in case of success. I hope that you won't find it too obtrusive.

ZEND_ASSERT(uri_handler->parse_uri != NULL);
ZEND_ASSERT(uri_handler->clone_uri != NULL);
ZEND_ASSERT(uri_handler->uri_to_string != NULL);
ZEND_ASSERT(uri_handler->clone_uri != NULL || strcmp(uri_handler->name, URI_PARSER_PHP) == 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep these assertions even though two handlers of the legacy parser became NULL. So I excluded this URI handler from the checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions with exceptions to the rule are a bit iffy IMHO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can understand your POV. Do you think it's better to get rid of the latter two special cased assertions?


ZEND_ASSERT(uri_handler->name != NULL);
ZEND_ASSERT(uri_handler->name != NULL && (strlen(uri_handler->name) > 0 || strcmp(uri_handler->name, URI_PARSER_PHP) == 0));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I renamed the legacy implementation to "", I wanted to make sure that other implementations cannot use the same handler.

Although, I think this assertion doesn't make any sense. Especially because module initialization will surely fail because of the name collision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? Why did you change the name to "" though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly because I wanted to avoid this to work:

filter_var("qwe", FILTER_VALIDATE_URL, ["uri_parser_class" => "parse_url"])

even though null is the suggested value to use for the "uri_parser_class" config. As "" is the most similar string value to null that can be stored in a uri_handlers hash table, I figured it's a better choice than parse_url.

Or do you have any better solutions in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "" is weird from a user point of view.
I still don't think I quite understand the problem though.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting pretty close to being merge-able


ZEND_ASSERT(uri_handler->name != NULL);
ZEND_ASSERT(uri_handler->name != NULL && (strlen(uri_handler->name) > 0 || strcmp(uri_handler->name, URI_PARSER_PHP) == 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? Why did you change the name to "" though?

ZEND_ASSERT(uri_handler->parse_uri != NULL);
ZEND_ASSERT(uri_handler->clone_uri != NULL);
ZEND_ASSERT(uri_handler->uri_to_string != NULL);
ZEND_ASSERT(uri_handler->clone_uri != NULL || strcmp(uri_handler->name, URI_PARSER_PHP) == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions with exceptions to the rule are a bit iffy IMHO

@kocsismate
Copy link
Member Author

I'll fix the CI probably tomorrow night

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants